Frequency Calculations


In [1]:
import numpy as np

In [2]:
with open('qm_files/drop_0375_0qm_0mm.out') as f:
    contents_qmoutput = f.read()

Advanced: calculate frequencies directly from the mass-weighted Hessian

How can you prove to yourself that the frequencies printed at the end of a QM output are correct? We can take the Hessian (mass-weighted or otherwise), which is the second derivative of the energy with respect to nuclear displacements,

$H_{ij} = \frac{\partial^{2} E}{\partial x_{i} \partial x_{j}}$,

diagonalize it, and multiply the eigenvalues by some constants to obtain vibrational (normal mode) frequencies in wavenumbers.

A useful resource for understanding this is given by http://sirius.chem.vt.edu/wiki/doku.php?id=crawdad:programming:project2, which most of this section is derived from.

First, we need to locate the Hessian. If we can only find one that isn't mass-weighted, we'll need to do that too.


In [3]:
contents_splitlines = contents_qmoutput.splitlines()

In [4]:
contents_splitlines[340:370]


Out[4]:
[' Direct stationary perturbation theory relativistic correction:',
 ' ',
 ' rels  =       0.068935909736',
 ' relv  =      -0.274784382948',
 ' rel2e =       0.090968803792',
 ' E_rel =      -0.114879669420',
 ' ',
 ' Hessian of the SCF Energy',
 '            1           2           3           4           5           6',
 '    1   1.0247744  -0.8068231  -0.5202480  -0.5826944   0.4191537   0.2742436',
 '    2  -0.8068231   0.8607577   0.4729366   0.4286992  -0.4016804  -0.2244742',
 '    3  -0.5202480   0.4729366   0.4306138   0.2796608  -0.2238104  -0.2051113',
 '    4  -0.5826944   0.4286992   0.2796608   0.6023285  -0.4739293  -0.3091596',
 '    5   0.4191537  -0.4016804  -0.2238104  -0.4739293   0.4119744   0.2535997',
 '    6   0.2742436  -0.2244742  -0.2051113  -0.3091596   0.2535997   0.1885765',
 '    7  -0.4420800   0.3781240   0.2405872  -0.0196341   0.0547757   0.0349160',
 '    8   0.3876694  -0.4590773  -0.2491262   0.0452302  -0.0102940  -0.0291255',
 '    9   0.2460043  -0.2484624  -0.2255025   0.0294988  -0.0297893   0.0165348',
 '            7           8           9',
 '    1  -0.4420800   0.3876694   0.2460043',
 '    2   0.3781240  -0.4590773  -0.2484624',
 '    3   0.2405872  -0.2491262  -0.2255025',
 '    4  -0.0196341   0.0452302   0.0294988',
 '    5   0.0547757  -0.0102940  -0.0297893',
 '    6   0.0349160  -0.0291255   0.0165348',
 '    7   0.4617141  -0.4328996  -0.2755032',
 '    8  -0.4328996   0.4693714   0.2782517',
 '    9  -0.2755032   0.2782517   0.2089677',
 ' Gradient time:  CPU 21.70 s  wall 21.71 s',
 ' **********************************************************************']

There's a line Hessian of the SCF Energy, which sounds like it might not be mass-weighted. First, let's try and extract it.


In [5]:
# lists are iterable, but not `iterators` themselves.
contents_iter = iter(contents_splitlines)
for line in contents_iter:
    if 'Hessian of the SCF Energy' in line:
        while 'Gradient time:' not in line:
            print(line)
            line = next(contents_iter)


 Hessian of the SCF Energy
            1           2           3           4           5           6
    1   1.0247744  -0.8068231  -0.5202480  -0.5826944   0.4191537   0.2742436
    2  -0.8068231   0.8607577   0.4729366   0.4286992  -0.4016804  -0.2244742
    3  -0.5202480   0.4729366   0.4306138   0.2796608  -0.2238104  -0.2051113
    4  -0.5826944   0.4286992   0.2796608   0.6023285  -0.4739293  -0.3091596
    5   0.4191537  -0.4016804  -0.2238104  -0.4739293   0.4119744   0.2535997
    6   0.2742436  -0.2244742  -0.2051113  -0.3091596   0.2535997   0.1885765
    7  -0.4420800   0.3781240   0.2405872  -0.0196341   0.0547757   0.0349160
    8   0.3876694  -0.4590773  -0.2491262   0.0452302  -0.0102940  -0.0291255
    9   0.2460043  -0.2484624  -0.2255025   0.0294988  -0.0297893   0.0165348
            7           8           9
    1  -0.4420800   0.3876694   0.2460043
    2   0.3781240  -0.4590773  -0.2484624
    3   0.2405872  -0.2491262  -0.2255025
    4  -0.0196341   0.0452302   0.0294988
    5   0.0547757  -0.0102940  -0.0297893
    6   0.0349160  -0.0291255   0.0165348
    7   0.4617141  -0.4328996  -0.2755032
    8  -0.4328996   0.4693714   0.2782517
    9  -0.2755032   0.2782517   0.2089677

The Hessian matrix should have $3N * 3N$ elements, where $N$ is the number of atoms; this is because each atom contributes 3 Cartesian coordinates, and it is two-dimensional because of the double derivative. By this logic, the gradient may be represented by a length $3N$ vector, and cubic force constants by an array of shape $[3N, 3N, 3N]$.

Now that we can extract only the lines that contain the Hessian information, let's make a NumPy array that will hold the final results.


In [6]:
N = 3
hessian_scf = np.zeros(shape=(3*N, 3*N))

Here comes the tricky Python bits. For matrices, Q-Chem prints a maximum of 6 columns per block; this info is necessary for keeping track of where we are during parsing.


In [7]:
ncols_block = 6

Here's the machinery that will do the actual iteration over the matrix, leaving out the part where the results are stored.


In [8]:
contents_iter = iter(contents_splitlines)

for line in contents_iter:
    if 'Hessian of the SCF Energy' in line:
        # skip the line containing the header
        line = next(contents_iter)
        
        # keep track of how many "columns" there are left; better than searching
        # for the gradient line
        ncols_remaining = 3*N
        while ncols_remaining > 0:
            # this must be the column header
            if len(line.split()) == min(ncols_block, ncols_remaining):
                print(line)
                ncols_remaining -= ncols_block
                sline = line.split()
                colmin, colmax = int(sline[0]) - 1, int(sline[-1])
                line = next(contents_iter)
            # iterate over the rows
            for row in range(3*N):
                print(line)
                line = next(contents_iter)


            1           2           3           4           5           6
    1   1.0247744  -0.8068231  -0.5202480  -0.5826944   0.4191537   0.2742436
    2  -0.8068231   0.8607577   0.4729366   0.4286992  -0.4016804  -0.2244742
    3  -0.5202480   0.4729366   0.4306138   0.2796608  -0.2238104  -0.2051113
    4  -0.5826944   0.4286992   0.2796608   0.6023285  -0.4739293  -0.3091596
    5   0.4191537  -0.4016804  -0.2238104  -0.4739293   0.4119744   0.2535997
    6   0.2742436  -0.2244742  -0.2051113  -0.3091596   0.2535997   0.1885765
    7  -0.4420800   0.3781240   0.2405872  -0.0196341   0.0547757   0.0349160
    8   0.3876694  -0.4590773  -0.2491262   0.0452302  -0.0102940  -0.0291255
    9   0.2460043  -0.2484624  -0.2255025   0.0294988  -0.0297893   0.0165348
            7           8           9
    1  -0.4420800   0.3876694   0.2460043
    2   0.3781240  -0.4590773  -0.2484624
    3   0.2405872  -0.2491262  -0.2255025
    4  -0.0196341   0.0452302   0.0294988
    5   0.0547757  -0.0102940  -0.0297893
    6   0.0349160  -0.0291255   0.0165348
    7   0.4617141  -0.4328996  -0.2755032
    8  -0.4328996   0.4693714   0.2782517
    9  -0.2755032   0.2782517   0.2089677

An important note: the Hessian (and all other matrices in the output) start from 1, but Python indices for lists and arrays start at 0; this is why int() - 1 shows up. But why not for colmax? Because of how Python slicing for lists and arrays doesn't include the ending number. (Elaborate?)


In [9]:
contents_iter = iter(contents_splitlines)

for line in contents_iter:
    if 'Hessian of the SCF Energy' in line:
        # skip the line containing the header
        line = next(contents_iter)
        
        # keep track of how many "columns" there are left; better than searching
        # for the gradient line
        ncols_remaining = 3*N
        while ncols_remaining > 0:
            # this must be the column header
            if len(line.split()) == min(ncols_block, ncols_remaining):
                ncols_remaining -= ncols_block
                sline = line.split()
                colmin, colmax = int(sline[0]) - 1, int(sline[-1])
                line = next(contents_iter)
            # iterate over the rows
            for row in range(3*N):
                sline = line.split()
                rowidx = int(sline[0]) - 1
                hessian_scf[rowidx, colmin:colmax] = list(map(float, sline[1:]))
                line = next(contents_iter)

In [10]:
hessian_scf


Out[10]:
array([[ 1.0247744, -0.8068231, -0.520248 , -0.5826944,  0.4191537,
         0.2742436, -0.44208  ,  0.3876694,  0.2460043],
       [-0.8068231,  0.8607577,  0.4729366,  0.4286992, -0.4016804,
        -0.2244742,  0.378124 , -0.4590773, -0.2484624],
       [-0.520248 ,  0.4729366,  0.4306138,  0.2796608, -0.2238104,
        -0.2051113,  0.2405872, -0.2491262, -0.2255025],
       [-0.5826944,  0.4286992,  0.2796608,  0.6023285, -0.4739293,
        -0.3091596, -0.0196341,  0.0452302,  0.0294988],
       [ 0.4191537, -0.4016804, -0.2238104, -0.4739293,  0.4119744,
         0.2535997,  0.0547757, -0.010294 , -0.0297893],
       [ 0.2742436, -0.2244742, -0.2051113, -0.3091596,  0.2535997,
         0.1885765,  0.034916 , -0.0291255,  0.0165348],
       [-0.44208  ,  0.378124 ,  0.2405872, -0.0196341,  0.0547757,
         0.034916 ,  0.4617141, -0.4328996, -0.2755032],
       [ 0.3876694, -0.4590773, -0.2491262,  0.0452302, -0.010294 ,
        -0.0291255, -0.4328996,  0.4693714,  0.2782517],
       [ 0.2460043, -0.2484624, -0.2255025,  0.0294988, -0.0297893,
         0.0165348, -0.2755032,  0.2782517,  0.2089677]])

The SCF Hessian has successfully been parsed from the output file and stored in a NumPy array. Since it needs to be mass-weighted, we need the masses of each atom. They're printed at the very end of the output file, and extracting them should be easier than for the Hessian.


In [11]:
contents_iter = iter(contents_splitlines)

for line in contents_iter:
    if 'Zero point vibrational energy:' in line:
        line = next(contents_iter)
        line = next(contents_iter)
        while 'Molecular Mass:' not in line:
            print(line.split())
            line = next(contents_iter)


['Atom', '1', 'Element', 'C', 'Has', 'Mass', '12.00000']
['Atom', '2', 'Element', 'O', 'Has', 'Mass', '15.99491']
['Atom', '3', 'Element', 'O', 'Has', 'Mass', '15.99491']

In [12]:
masses = []

contents_iter = iter(contents_splitlines)

for line in contents_iter:
    if 'Zero point vibrational energy:' in line:
        line = next(contents_iter)
        line = next(contents_iter)
        while 'Molecular Mass:' not in line:
            masses.append(float(line.split()[-1]))
            line = next(contents_iter)

In [13]:
masses


Out[13]:
[12.0, 15.99491, 15.99491]

Hopefully alarm bells are going off in your head right about now. Units! What units are we working in? Units units units!

The masses above are in amu (https://en.wikipedia.org/wiki/Atomic_mass_unit).

The SCF Hessian is certainly in atomic units (https://en.wikipedia.org/wiki/Atomic_units). Internally, all quantum chemistry programs work in atomic units. From Wikipedia:

This article deals with Hartree atomic units, where the numerical values of the following four fundamental physical constants are all unity by definition:

  • Electron mass $m_\text{e}$;
  • Elementary charge $e$;
  • Reduced Planck's constant $\hbar = h/(2 \pi)$;
  • Coulomb's constant $k_\text{e} = 1/(4 \pi \epsilon_0)$.

Since the Hessian is the double derivative of the energy with respect to nuclear diplacements (which can be considered positions or lengths), it will have units of $[\textrm{energy}][\textrm{length}]^{-2}$, which in atomic units is $E_{\textrm{h}}/a_{0}^{2}$ (hartree per bohr squared).

Now to mass-weight the SCF Hessian. The ordering along each dimension of the matrix is first atom x-coord, first atom y-coord, first atom z-coord, second atom x-coord, and so on. This means that every three columns or rows, we switch to a new atom. We'll be able to use integer division (// 3) to accomplish this cleanly.


In [14]:
# First, make a copy of the original Hessian array so we don't modify that one.
hessian_mass_weighted = hessian_scf.copy()

# We know the dimension a priori to be 3*N, but what if we don't? Use an array attribute!
shape = hessian_mass_weighted.shape
# shape is a tuple, here we "unpack" it.
nrows, ncols = shape

In [15]:
import math

In [16]:
for i in range(nrows):
    for j in range(ncols):
        _denom = math.sqrt(masses[i // 3] * masses[j // 3])
        hessian_mass_weighted[i, j] /= _denom

In [17]:
hessian_mass_weighted


Out[17]:
array([[ 0.08539787, -0.06723526, -0.043354  , -0.04205904,  0.03025463,
         0.01979498, -0.03190945,  0.02798208,  0.01775666],
       [-0.06723526,  0.07172981,  0.03941138,  0.03094362, -0.0289934 ,
        -0.01620261,  0.02729309, -0.03313632, -0.01793408],
       [-0.043354  ,  0.03941138,  0.03588448,  0.02018599, -0.01615469,
        -0.01480499,  0.01736565, -0.017982  , -0.01627683],
       [-0.04205904,  0.03094362,  0.02018599,  0.03765751, -0.02963001,
        -0.01932862, -0.00122752,  0.00282779,  0.00184426],
       [ 0.03025463, -0.0289934 , -0.01615469, -0.02963001,  0.02575659,
         0.01585503,  0.00342457, -0.00064358, -0.00186242],
       [ 0.01979498, -0.01620261, -0.01480499, -0.01932862,  0.01585503,
         0.01178978,  0.00218294, -0.00182092,  0.00103375],
       [-0.03190945,  0.02729309,  0.01736565, -0.00122752,  0.00342457,
         0.00218294,  0.02886631, -0.02706484, -0.01722443],
       [ 0.02798208, -0.03313632, -0.017982  ,  0.00282779, -0.00064358,
        -0.00182092, -0.02706484,  0.02934505,  0.01739627],
       [ 0.01775666, -0.01793408, -0.01627683,  0.00184426, -0.00186242,
         0.00103375, -0.01722443,  0.01739627,  0.01306464]])

We're ready to diagonalize the mass-weighted Hessian. If $\textrm{F}^{M}$ is the mass-weighted Hessian matrix, then the eigensystem defined by

$\textrm{F}^{M} \textbf{L} = \textbf{L} \Lambda$

results in the eigenvectors $\textbf{L}$, which are the normal modes of the system (the displacement vectors for each vibration, orthogonal to each other), and the eigenvalues $\Lambda$, which are the frequencies associated with each normal mode.


In [18]:
import scipy.linalg as spl

In [19]:
eigvals, eigvecs = spl.eig(hessian_mass_weighted)

In [20]:
print(eigvals)


[  2.36146632e-01+0.j   7.52625239e-02+0.j   1.46285696e-02+0.j
   1.44688883e-02+0.j  -5.00310212e-04+0.j  -5.14261190e-04+0.j
   6.15887396e-09+0.j  -4.32646978e-09+0.j   4.40868351e-10+0.j]

The Hessian is Hermitian, so we could use spl.eigh, and not return any complex values, but they're useful for illustrating a point to be made later on.

The eigenvalues of the mass-weighted Hessian will have the same units as the mass-weighted Hessian itself, which are ($[\textrm{energy}][\textrm{length}]^{-2}[\textrm{mass}]^{-1}$), or $\frac{E_{\textrm{h}}}{a_{0}^{2}m_{\textrm{e}}}$. These need to be converted to wavenumbers.

I'll save everyone the trouble of doing this conversion; I've done it once and that was enough.


In [21]:
bohr2m = 0.529177249e-10
hartree2joule = 4.35974434e-18
speed_of_light = 299792458
avogadro = 6.0221413e+23

In [22]:
vib_constant = math.sqrt((avogadro*hartree2joule*1000)/(bohr2m*bohr2m))/(2*math.pi*speed_of_light*100)
vib_constant


Out[22]:
5140.486777894163

The vibrational frequencies are related to the square roots of the eigenvalues, multiplied by the above constant:

$\omega_{i} = \textrm{constant} \times \sqrt{\lambda_{i}}$


In [23]:
vibfreqs = np.sqrt(eigvals) * vib_constant
vibfreqs


Out[23]:
array([  2.49801547e+03  +0.j        ,   1.41024197e+03  +0.j        ,
         6.21734801e+02  +0.j        ,   6.18332151e+02  +0.j        ,
         0.00000000e+00+114.98043046j,   0.00000000e+00+116.57250297j,
         4.03417659e-01  +0.j        ,   0.00000000e+00  +0.33812017j,
         1.07934108e-01  +0.j        ])

Interesting! Some of our frequencies are imaginary, which NumPy handles properly (math.sqrt(-1.0) would throw an error). What does this mean?

The eigenvalues of the full molecular Hessian contain not just vibrational frequencies, but those corresponding to rotational and translational motion (though one would want to convert those to periods from frequencies). For non-linear molecules, this results in $3N-6$ vibrations, after subtracting out the 3 translations and 3 rotations. For linear molecules, this is $3N-5$.

Imaginary frequencies would normally mean we aren't at a stationary point on the potential energy surface (PES), but notice that these imaginary frequencies will be removed once we take into account translational and rotational motion. They could be zeroed out completely if one projects out translations and rotations from the Hessian, which we won't do for now.


In [24]:
vibfreqs_calculated = vibfreqs[:(3*N)-5].real
vibfreqs_calculated


Out[24]:
array([ 2498.01547492,  1410.24196992,   621.73480085,   618.33215072])

How do the frequencies we've calculated compare with those printed in the output file?


In [25]:
vibfreqs_printed = [621.29, 1410.25, 2498.02]
vibfreqs_printed.reverse()

In [26]:
%matplotlib inline
import matplotlib.pyplot as plt

In [27]:
fig, ax = plt.subplots()

xticks = range(len(vibfreqs_printed))

ax.plot(xticks, vibfreqs_calculated[:-1], label='calculated', marker='o')
ax.plot(xticks, vibfreqs_printed, label='printed', marker='o')

ax.set_ylabel(r'frequency ($\mathrm{cm}^{-1}$)')

ax.legend(loc='best', fancybox=True)


Out[27]:
<matplotlib.legend.Legend at 0x7f44bc1eab00>

The plot doesn't tell us very much, since the differences are all < 1 $\textrm{cm}^{-1}$:


In [28]:
vibfreqs_calculated[:-1] - vibfreqs_printed


Out[28]:
array([-0.00452508, -0.00803008,  0.44480085])

So far, so good; however, why would there be any discrepancies at all?

  1. Numerical precision regarding constants used. The constants above might have more or fewer decimal places used than the QM package.
  2. Numerical precision regarding the printed Hessian. The human-readable output is limited to 6 decimal places; machine-precision for doubles is ~1e-16.
  3. Numerical precision regarding the atomic masses used. Ours didn't have many decimal places.
  4. Contamination of the Hessian due to translations and rotations. Q-Chem projects these out of the Hessian before calculating the vibrational frequencies, but doesn't print this matrix unless vibman_print gets increased in the input file.

There is one more discrepancy. Notice that there are only 3 printed values, but 4 calculated values? Hmm...